Frame buffer compression for video processing devices

ABSTRACT

For compressing a video signal, a local multiscale transform is applied to a frame of the video signal to obtain coefficient blocks. The coefficients of each block are distributed into a plurality of coefficient groups, and for at least one of the groups, a common exponent is determined for encoding the coefficients of the group, and respective mantissas are determined for quantizing the coefficients of the group in combination with the common exponent. Coding data including each exponent determined for a coefficient group and the mantissas quantizing the coefficients of the group in combination with this exponent are stored in an external frame buffer.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation under 35 U.S.C. §120 of U.S. patent application Ser. No. 12/738,067, titled “FRAME BUFFER COMPRESSION FOR VIDEO PROCESSING DEVICES,” filed Apr. 14, 2010, now U.S. Pat. No. 8,559,499, which is hereby incorporated by reference in its entirety. U.S. patent application Ser. No. 12/738,067 is a National Stage application under 35 U.S.C. §371 of International Application PCT/IB2007/055379, filed on Oct. 26, 2007 and titled “FRAME BUFFER COMPRESSION FOR VIDEO PROCESSING DEVICES.”

BACKGROUND OF THE INVENTION

The present invention relates to video processing and in particular to real-time video processing in dedicated hardware devices.

In the design of such dedicated hardware video processing devices, it is generally desired to reduce the need for external memory components, and for internal memory.

In a video processing device embodied as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), input frames are stored in a frame buffer usually located in external memory, because they do not fit in the device itself. For processing, several frames are loaded line by line to be stored in an internal memory of the device, called a line buffer.

FIG. 1 shows the typical data flow and storage involved in a conventional video processing device 8. The input pixels 1 received at an input port 2 are stored into a frame buffer 4, usually implemented as one or more dynamic random access memory (DRAM) chips, via a DRAM interface 3. Then, the video processor 6 fetches lines from the DRAM 4 through the DRAM interface 3, storing them temporarily in the line buffer 5. The output 9 of processor 6 is fed to the output port 7 to be transmitted to the next device to which the video processing device 8 is connected. All image transfers are done in raster order, i.e. each frame full line by full line, and each line of a frame pixel by pixel from left to right.

In such a device 8, using an external DRAM 4 is required if the video processor 6 needs to process simultaneously pixels originating from different frames. This is necessary, for example, in applications such as deinterlacing, frame rate conversion, and overdrive processing in LCD timing controllers.

If the video processor 6 also needs to have access to pixels of different lines at the same time, a line buffer 5 of substantial size needs to be present inside the device 8. Important design parameters include the size of the DRAM 4, the available bandwidth between the device 8 and the DRAM chip(s) 4, and the size of the line buffer 5.

Considering input video frames of Y lines of X pixels each, with an input frame rate of F, the input pixel rate is X×Y×F, not taking into account blanking. Typical values are X=1920, Y=1080 and F=50 or 60 FPS (frames per second). Similar parameters X′, Y′ and F′ describe the output frame size and frame rate. In order to output one pixel, the video processor 6 needs to have simultaneous access to a context of C lines of the input video frames, for N different video frames. The DRAM 4 must then be able to store at least N frames of video, i.e. a total of X×Y×N pixels. At the DRAM interface, the pixel rate is X×Y×F pixels per second for writing and X×Y×N×F′ pixels per second for reading. Typical data rates are then 1 billion pixels per second, which amounts to 30 Gb/s if a pixel is represented in RGB with 10 bits per channel. High transfer rates between the device 8 and the DRAM 4 are not desirable because they may require using a higher number of DRAM chips in parallel. The video processing device (in the case of an ASIC) then needs to have a large number of pins to access all the DRAM chips.
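As a rough check of the figures above, the DRAM traffic can be computed directly from the stated formulas. The sketch below is only illustrative: the function name `dram_traffic` is hypothetical, and the value N=7 is an assumption chosen so that the total lands near the quoted 1 billion pixels per second; the text itself does not give a value for N.

```python
def dram_traffic(x, y, f_in, f_out, n_frames, bits_per_pixel):
    """Return (write pixels/s, read pixels/s, total bits/s) at the DRAM interface."""
    write_rate = x * y * f_in               # X*Y*F pixels per second written
    read_rate = x * y * n_frames * f_out    # X*Y*N*F' pixels per second read
    total_bits = (write_rate + read_rate) * bits_per_pixel
    return write_rate, read_rate, total_bits

# 1080p at 60 FPS in and out, 30-bit RGB, N = 7 (assumed, not from the text)
w, r, bits = dram_traffic(1920, 1080, 60, 60, n_frames=7, bits_per_pixel=30)
print(w + r, "pixels/s,", bits / 1e9, "Gb/s")
```

With these assumed values the total is just under 1 billion pixels per second, i.e. just under 30 Gb/s, which is consistent with the order of magnitude quoted above.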

The required size of the internal video buffer 5 is X×C×N pixels. Hosting such a large line buffer in an ASIC is expensive, because it increases the die size of the ASIC, and has a negative impact on the manufacturing yield. It is thus desirable to limit as much as possible the size of the line buffer.

One way of reducing the size of the internal line buffer is to perform sequential processing by splitting the images into tiles, instead of working on full frames in raster order. This is illustrated in FIG. 2. The input video frames 1 are written into DRAM 4 via the input port 2 and the DRAM interface 3 like in FIG. 1. However, the lines of the frames are not read in their entirety at once. Instead, the frames are split horizontally into smaller vertical windows, or tiles, and the tiles are processed in succession. The gain is that the lines of the line buffer 5 have a length smaller than the full width of the video frame, corresponding to the width of the tiles. The overall size of the line buffer 5 can then be reduced in proportion. The downside is that the tiles must overlap so that the output tiles can be merged without any boundary artifact between the tiles. This causes an increase in the data rate in proportion to the overlapping factor, which can be 20-30%. This proportion increases with the number of tiles. In addition, the output of the video processor 6 cannot be directly sent to the output port 7 because it is not in the raster order, but rather in the order of the tiles. A reordering of the pixels is necessary, and this requires an additional transit via the DRAM 4 between the video processor 6 and the output port 7. This can also substantially increase the required bandwidth at the DRAM interface. The solution illustrated by FIG. 2 thus trades a reduction of the internal memory required by line buffers 5 for an increase of bandwidth to the external memory 4.

Compression techniques are another way of reducing both the required size of the internal memory and the bandwidth to the external DRAM chip(s). One way of using compression to this end is illustrated in FIG. 3. Between the input port 2 and the DRAM interface 3, an encoder 10 compresses the input pixel sequence for storage into DRAM 4. For operating the video processor 6, a decoder 20 receives the compressed pixel data read from DRAM 4 to restore decompressed pixel lines written into the decompressed line buffer 15, which may contain several adjacent lines forming a stripe. The video processor 6 reads pixel values from the decompressed line buffer 15, and delivers output pixels 9 via the output port 7.

The bandwidth to or from the external DRAM chip(s) is divided by the compression factor provided by the compression. The number/size of external DRAM chip(s) can be reduced by the same factor. Applying compression in such a context is disclosed in US 2007/0110151 A1, where a differential pulse code modulation (DPCM) scheme is used for compression.

In certain known compression techniques, the RGB pixels are converted to a YUV color space, and the color channels U and V are low-pass filtered and down-sampled by a factor of 2 horizontally. The frame is then stored in what is commonly called YUV 422 format. Other color sub-sampling schemes exist, like YUV 420 or YUV 411. See, e.g., WO 2006/090334. Recovering the RGB pixels requires first up-sampling the U and V color planes again, and then doing the color space conversion from YUV back to RGB. In this way, the color information is simply down-sampled. For certain kinds of contents, such as video games, reducing the color resolution produces visible artifacts. Such compression schemes allow compression factors of 1.5:1, or 2:1 in the very best case.

More efficient compression schemes such as JPEG or JPEG-2000 are widely known. They offer a visual quality close to lossless with a compression factor of 3 to 5. They are not well suited to this use, though, because in most cases random access to an image region is not possible without decompressing the entire image. Also, it is desirable that the frame buffer compression process provides a constant bit rate (CBR) reduction factor in order to ensure that the peak bit rate for transmitting the frame buffers at a constant pixel rate is controlled.

There is a need for a new way of dealing with frame and line buffer constraints in video processing devices. There is also a need for a compression scheme usable in such a context, which provides a good tradeoff between compression ratio and image quality, while satisfying a CBR constraint with a fine granularity.

SUMMARY OF THE INVENTION

A method of compressing a video signal is proposed, comprising:

-   applying a local multiscale transform to a frame of the video signal to obtain coefficient blocks;
-   distributing the coefficients of each block into a plurality of coefficient groups;
-   for at least one of said groups:
    -   determining a common exponent for encoding the coefficients of said group; and
    -   determining respective mantissas for quantizing the coefficients of said group in combination with the common exponent;
-   storing coding data including each exponent determined for a coefficient group and the mantissas quantizing coefficients of said group in combination with said exponent.

The image coefficients are grouped into relatively small blocks of coefficients (e.g. 4×4 or 8×8 coefficients) that are each represented with the same number of bits. A coefficient block corresponds to a small region of the frame (e.g. 4×4 or 8×8 pixels). This allows performing a direct memory access to a compressed frame buffer with minimal overhead.

Groups of multiscale (e.g. wavelet) coefficients are represented with a global exponent, shared by all coefficients within the group, and individual signed mantissas. The multiscale coefficients can generally be positive or negative. The mantissas determined for each coefficient can be seen as positive numbers, in which case they are associated with a sign bit, or as signed numbers. Using an exponent, a sign and a mantissa for a single coefficient is the basic principle of all floating point representations of numbers in computers.

The compression method affords selective access to the frame data in a scalable way. Low-definition information can be accessed separately at low cost, and when high-definition information becomes necessary, additional and larger information can be loaded from a separate layer of the frame buffer to refine the coarse scale pixel information. This is an advantage provided by using a local multiscale transform such as a wavelet transform in the compression method.

Each coefficient group will generally contain coefficients corresponding to a common scale of the local multiscale transform. A particular case is the low-pass coefficient (highest scale of the transform) that will typically not be quantized in a mantissa-exponent representation, but copied uncompressed in the stored coding data. For scalable access to the coding data, it is convenient that the amount of coding data stored for one coefficient group of a block is the same for all groups corresponding to a given scale of the local multiscale transform.

As a complement to the above compression method, there is provided a method of decompressing a video signal from coding data, wherein, for a frame of the video signal, the coding data include block data for respective coefficient blocks corresponding to respective regions of the frame in a local multiscale transform. Each block comprises a plurality of coefficient groups. The block data for each coefficient block include exponents respectively associated with some of the coefficient groups of said block and mantissas respectively associated with the coefficients of said some of the groups. The method comprises:

-   reading at least part of the block data for at least one coefficient block, the read block data including the exponent associated with at least one coefficient group selected among the groups of said block and the mantissas respectively associated with the coefficients of each selected group;
-   recovering approximated values of the coefficients of each selected group by combining the mantissas respectively associated with said coefficients and the exponent associated with said selected group;
-   assembling at least partially said coefficient block using the approximated coefficient values; and
-   applying a local inverse multiscale transform to the assembled coefficient block.

In an embodiment, the step of reading the block data for said coefficient block comprises selecting said at least one coefficient group based on a target definition for the decompressed signal of the frame.

Other aspects of the invention relate to an encoder and a decoder arranged for implementing the above compression and decompression methods. Such encoder and decoder can in particular have application in video processing devices.

Such a video processing device according to the invention comprises:

-   an encoder for compressing a video signal in the form of coding data for successive frames of the video signal;
-   a memory interface for storing the coding data in an external memory and retrieving coding data from the external memory;
-   a decoder for converting retrieved coding data into a decompressed signal; and
-   a video processor for processing the decompressed signal.

The encoder comprises:

-   a transforming unit for applying a local multiscale transform to a frame of the video signal to obtain coefficient blocks;
-   a mapping unit for distributing the coefficients of each block into a plurality of coefficient groups; and
-   a coding data generator for generating coding data including, for at least one of said groups:
    -   a common exponent for encoding the coefficients of said group; and
    -   respective mantissas for quantizing the coefficients of said group in combination with the common exponent.

The decoder comprises:

-   an extraction unit for extracting the coding data for at least one coefficient group selected among the groups of the coefficient block;
-   a computation unit for combining the mantissas forming part of the coding data for each selected group and the exponent forming part of the coding data for said selected group to obtain approximated values of the coefficients of said selected group;
-   an assembling unit for assembling at least partially said coefficient block using the approximated coefficient values; and
-   a transforming unit for applying a local inverse multiscale transform to the assembled coefficient block.

When the compressed video signal is available according to luma and chroma channels (initially or after a change of color coordinate system), the coding data generated from the signal component of the luma channel can be allocated more bits than the coding data generated from the signal component of each chroma channel. This makes it possible to optimize the compression ratio while keeping a good quality of the signal.

In order to easily access the coding data, it is convenient if the amount of coding data stored in the external memory for the groups of a coefficient block is the same for all coefficient blocks obtained from a component (e.g. one RGB color, or a luma or chroma channel) of the video signal.

Each coefficient group may be assigned a respective mantissa depth parameter corresponding to a number of bits representing each mantissa forming part of the coding data for said group. The common exponent for said group is then determined based on the values of the coefficients of said group and on said mantissa depth parameter. Each coefficient group for which coding data including an exponent and mantissas are generated will typically be made up of coefficients resulting from the local multiscale transform at a same scale n, with 1≤n≤N, N being the number of scales of the multiscale transform. The mantissa depth parameter is preferably a decreasing function of the scale index n, which optimizes the compression ratio since the fine-scale coefficients, i.e. with n small, are more numerous and perceptually less important than the coarse-scale coefficients.

An embodiment of the video processing device further comprises a decompressed line buffer for storing the decompressed signal along a stripe of consecutive lines of at least one frame. The video processor is then arranged to read the decompressed signal from the line buffer.

Alternatively, the video processing device comprises a compressed line buffer for storing coding data transferred from the external memory for a plurality of regions of a frame spanning a stripe of lines of said frame, and a context buffer for storing the decompressed signal in a context portion of said frame, the context portion being included in said stripe of lines and offset according to a pixel location addressed by the video processor. The video processor is then arranged to read the decompressed signal from the context buffer.

It may be observed that the latter embodiment may be used with various compression schemes other than the one discussed above. Accordingly, another aspect of the invention relates to a video processing device, comprising:

-   an encoder for compressing a video signal in the form of coding data for successive frames of the video signal;
-   a memory interface for storing the coding data in an external memory and retrieving coding data from the external memory;
-   a compressed line buffer for storing coding data transferred from the external memory for a plurality of regions of a frame spanning a stripe of lines of said frame;
-   a decoder for converting coding data read in the compressed line buffer into a decompressed signal;
-   a video processor for processing the decompressed signal; and
-   a context buffer for storing the decompressed signal of a context portion of said frame, the context portion being included in said stripe of lines and offset according to a pixel location addressed by the video processor.

Such an embodiment makes it possible for the compression to reduce not only the size of the external frame buffer but also that of the internal line buffer of the device. Only a small context portion needs to be stored explicitly in the decompressed form.

The decoder may be arranged to update the content of the context buffer as the video processor proceeds along a line of pixels of a frame. To do so, it deletes at least one column of pixels on one side of the context portion and adds, on the opposite side of the context portion, at least one other column of decompressed pixels obtained based on coding data retrieved from the compressed line buffer for selected regions covering said other column of decompressed pixels.

When the coding data represent coefficients of a local multiscale transform, the compressed line buffer may have a first layer for receiving coding data representing first coefficients of at least one first scale for said plurality of regions spanning the stripe of lines, and at least one second layer for receiving coding data representing second coefficients of at least one second scale finer than said first scale for some of the regions of said plurality of regions spanning a narrower stripe of the frame. The decoder is then arranged to generate the decompressed signal of the context portion by extracting coding data from both the first and second layers of the compressed line buffer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1-3, discussed above, are block diagrams of conventional video processing devices.

FIG. 4 is a block diagram of a video processing device according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating a way of organizing line and context buffers in an embodiment of the invention.

FIG. 6 is a diagram illustrating a way of distributing and representing multiscale image coefficients in an embodiment of the invention.

FIGS. 7 and 8 are block diagrams of an exemplary encoder and decoder according to the invention.

FIG. 9 shows diagrams illustrating the mapping of coefficients of a block onto coefficient groups in a specific example.

FIGS. 10 and 11 are diagrams illustrating the correspondence between pixel regions in a video frame and coefficient blocks resulting from an exemplary local multiscale transform.

DESCRIPTION OF PREFERRED EMBODIMENTS

Compression can be used for reducing the need for internal memory inside a video processing device 8 as discussed in the introduction. This is illustrated in FIG. 4. The line information is transferred in a compressed form from the DRAM 4 to be stored into a compressed line buffer 25 whose size, compared to the decompressed line buffer 15 of FIG. 3, is reduced by the compression factor. The decoder 20 decompresses on-the-fly pixels from the line buffer 25 to store decompressed pixels in a small-sized context buffer 30.

On-the-fly decompression of the context portion is performed as the video processor 6 is proceeding along a line of the current output frame. FIG. 5 illustrates the operation of decoder 20 in the video processing device 8 of FIG. 4. The compressed line buffer 25 contains coding data corresponding to a horizontal stripe 51 of pixels. As an example, the uncompressed pixels are each made of 30 bits in RGB representation and the compression factor is 2:1, so that the number of bits per pixel in the compressed state is 15.

The video processor 6 runs along the pixel frames in raster order. At a given point, it is processing a pixel of coordinates (x, y). Stripe 51 covers pixels useful for processing all pixels of coordinates (x′, y) where x′ covers the width of the image. When processing pixel (x, y), the video processor 6 needs access to a context of decompressed pixels 52. In the example considered here, the context portion is a rectangle [x−w; x+w]×[y−h; y+h], where w and h are the half-width and the half-height of the context. The decompressed pixels of the context portion 52 are maintained in a separate storage area, namely the context buffer 30. The decompressed pixel context is much narrower than the full line buffer. It is computed from a corresponding compressed context 55 which is part of the stripe 51 stored in the line buffer 25. In the example, the context of compressed pixels 55 is a rectangle [x−W; x+W]×[y−H; y+H], with W≥w and H≥h. So the height of stripe 51 must be sufficient to include 2H lines. When turning to the next pixel to be processed, at (x+1, y), the context of decompressed pixels 52 is updated as follows: the leftmost column is dropped, and an additional column 53 of new decompressed pixels is computed and added as the new rightmost column of the context portion. This column of pixels 53 can be derived from a small set of compressed pixel coefficients located at 54 in the stripe 51 stored in line buffer 25. Depending on the needs of the video processing architecture using the context of decompressed pixels 52, the shape of the context may differ from the above simple example. It may be not centered around the current pixel, but more generally offset in accordance with the pixel location x, y. For example it can be a rectangle [x−w; x+w′]×[y−h; y+h′]. It may be of non-rectangular shape, or even non-connected (e.g. several distinct rectangles). In the case of non-rectangular shapes, the context of decompressed pixels 52 may be updated by dropping one or more columns of pixels and adding also one or more columns of decompressed pixels. For simplicity of the description however, the simpler case of a centered rectangular context is exemplified in the drawings.
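The column-by-column update of the centered rectangular context can be sketched as a sliding window. This is a minimal illustration, not the device's actual logic: `decode_column` is a hypothetical stand-in for whatever the decoder 20 does to produce one column of decompressed pixels from the compressed line buffer, and the function names are assumptions.

```python
from collections import deque

def make_context(decode_column, x, w):
    """Build the initial context holding columns x-w .. x+w (2w+1 columns)."""
    columns = (decode_column(cx) for cx in range(x - w, x + w + 1))
    return deque(columns, maxlen=2 * w + 1)

def advance(context, decode_column, x, w):
    """Move from pixel x to x+1: the new rightmost column is appended and,
    because the deque is full, the leftmost column is dropped automatically."""
    context.append(decode_column(x + 1 + w))
    return x + 1

# Toy decoder that just returns the column index, to make the window visible.
ctx = make_context(lambda cx: cx, x=10, w=2)
x = advance(ctx, lambda cx: cx, x=10, w=2)
print(x, list(ctx))
```

Only one new column is decoded per step, which is the point of keeping the context buffer separate from the compressed line buffer.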

The device shown in FIG. 4 can make use of various compression schemes, for example the YUV 422, YUV 420 or YUV 411 schemes mentioned previously. It can also be used advantageously with the new compression/decompression scheme described below.

In the first step of the compression process, the encoder 10 applies a multiscale transform to the pixels of the current frame. In the following, this multiscale transform is a wavelet transform. A low-complexity transform such as a Haar or 5-3 Daubechies wavelet transform can in particular be used. The transform is performed with a predefined number of scales. The transform is assumed to map integers to integers and is performed in-place using lifting steps.
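To make the idea of an integer-to-integer, in-place lifting transform concrete, here is a one-dimensional sketch using the integer Haar transform (often called the S-transform). It is an assumption-laden illustration only: the text does not specify the lifting steps, the 5-3 filter is not covered, the function names are hypothetical, and the input length is assumed to be a multiple of 2**n_scales.

```python
def haar_forward_inplace(v, n_scales):
    """In-place integer Haar lifting on a list of integers.
    At scale n, the detail coefficient is left at index 2**n * i + 2**(n-1)."""
    for n in range(1, n_scales + 1):
        step, half = 1 << n, 1 << (n - 1)
        for i in range(0, len(v), step):
            v[i + half] -= v[i]          # predict step: detail = right - left
            v[i] += v[i + half] >> 1     # update step: left becomes the floored average
    return v

def haar_inverse_inplace(v, n_scales):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    for n in range(n_scales, 0, -1):
        step, half = 1 << n, 1 << (n - 1)
        for i in range(0, len(v), step):
            v[i] -= v[i + half] >> 1
            v[i + half] += v[i]
    return v

data = [5, 8, 3, 1, 7, 7, 2, 9]
coeffs = haar_forward_inplace(list(data), 3)
print(coeffs)
print(haar_inverse_inplace(list(coeffs), 3) == data)
```

Because each lifting step is undone by the same step with the sign flipped, the transform is exactly invertible on integers, which is what allows it to run in place without extra precision.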

Through the multiscale wavelet transform, a correspondence is established between regions of a current frame and blocks of transform coefficients. In certain embodiments, the correspondence may be one-to-one between image regions and coefficient blocks, but this is not always the case.

For example, in the above-mentioned case of a wavelet transform performed in-place, the correspondence may be as illustrated in FIGS. 10-11. In FIG. 10, A1 denotes an image made of pixels $I[x, y]$ where the integer indices $x$ and $y$ are in the intervals $0 \le x < X$ and $0 \le y < Y$. The transform is conventionally arranged so that the transform of the whole image contains the same number of coefficients as the number of pixels in the image, and that the transform coefficients are indexed in the same way. The transform coefficients $C[x, y]$ are then defined for the same values of $x$ and $y$. In addition, the transform is local in the sense that a block A2 of coefficients $C[x, y]$ for $p \times 2^{N} \le x < (p+1) \times 2^{N}$ and $q \times 2^{N} \le y < (q+1) \times 2^{N}$ can be computed from pixels with the same indices, and pixels located in a vicinity depicted by the hatched area A3. In the illustrated example, $N=3$, the blocks being made of $8 \times 8 = 64$ coefficients. So in this case, coefficient block A2 corresponds to region A2∪A3 in the image.

FIG. 11 represents the array of transform coefficients B1, having the same size as the original image array. The inverse transform is also local, in the sense that, in order to compute a pixel value at x, y (depicted by the black dot B2), a limited number of coefficient blocks located around this pixel (B3) are needed to perform the reconstruction of the pixel value. These blocks (B3) are those corresponding to the image regions including pixel B2.

Many video processing applications do not need a random access with a granularity of a single pixel, but of a cluster of pixels, or a cluster of pixels moving in raster order, reconstructed with a pipelined wavelet reconstruction, so that the apparently poor ratio (volume of data needed to reconstruct a single pixel value) is in practice much more favorable.

FIG. 6 illustrates a mapping scheme applied to the transform coefficients. The coefficients 61 resulting from the multiscale transform form an image that is split into several blocks of coefficients 62, each corresponding to a small region of the current frame. Each block of coefficients 62 is coded with a predefined number of bits as follows. A block of coefficients 62 is split into several groups of coefficients 63, 64, etc. Usually, all coefficients within a given group have the same nature (same type of coefficient, same dynamic range). For each block 62, a special group 63 is the one containing the low-pass coefficient of the multiscale transform. This low-pass coefficient is represented and stored with full precision 65. Each of the other groups of coefficients 64 is quantized at 66 with a so-called global exponent floating point (FP) representation.

A possible structure of the encoder 10 is illustrated in FIG. 7. The frames received from the input port 2 are provided to a transforming unit 70, in this case a wavelet transform unit, which processes them in the raster order to generate the multiscale coefficients mapped onto groups of coefficients by a unit 71 as outlined above.

For each group of coefficients $\{c_1, \ldots, c_p\}$, a global exponent representation is built. Namely each coefficient $c_i$ is approximated as:

$$c_i \approx 2^{e} \cdot m_i \qquad (1)$$

where $e$ is an exponent common for all coefficients within the group, and the numbers $m_i$ designate respective mantissas for the coefficients $c_i$.

The operations of the coding data generator 72-74 are sequenced as follows. A module 72 computes a global exponent $e$ from the input coefficients $\{c_1, \ldots, c_p\}$ of a group, as received from the mapping unit 71. Based on this global exponent $e$, an adaptive quantization is applied to the coefficients $c_1, \ldots, c_p$ by the quantization module 73 to compute the mantissas $m_1, \ldots, m_p$. These mantissas $m_1, \ldots, m_p$ and the exponent $e$ are then assembled together in a bit packing unit 74 to produce a compressed data unit of predetermined size.

The structure of the corresponding decoder 20 is illustrated in FIG. 8. The compressed data units are extracted by a bit unpacking unit 81 which recovers the mantissas $m_1, \ldots, m_p$ and the exponent $e$ for each group. These are used to “dequantize” reconstructed coefficients $\tilde{c}_i$ with a formula identical or similar to (1) in the computation unit 82. The reconstructed groups of coefficients $\{\tilde{c}_1, \ldots, \tilde{c}_p\}$ are assembled into reconstructed coefficient blocks and into images of reconstructed coefficients by the group mapping unit 83. The decompressed image portion is then computed by the transforming unit 84 by applying the inverse wavelet transform to the reconstructed coefficients.
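The bit packing and unpacking steps can be pictured with a small sketch. The field layout used here (exponent first, then one sign bit followed by the mantissa magnitude for each coefficient) is purely an assumption for illustration, as are the function names and the restriction to a non-negative exponent; the text only states that the unit has a predetermined size.

```python
def pack_group(e, mantissas, d, e_bits):
    """Pack an exponent (e_bits wide, assumed non-negative) and signed
    mantissas (1 sign bit + d magnitude bits each) into one integer."""
    word = e & ((1 << e_bits) - 1)
    for m in mantissas:
        sign = 1 if m < 0 else 0
        word = (word << (1 + d)) | (sign << d) | abs(m)
    return word

def unpack_group(word, p, d, e_bits):
    """Recover (exponent, mantissas) from a word produced by pack_group."""
    mantissas = []
    for _ in range(p):
        field = word & ((1 << (1 + d)) - 1)
        word >>= 1 + d
        magnitude = field & ((1 << d) - 1)
        mantissas.append(-magnitude if field >> d else magnitude)
    mantissas.reverse()                      # fields were read last-to-first
    return word & ((1 << e_bits) - 1), mantissas

unit = pack_group(3, [1, 8, 1, -1], d=4, e_bits=4)
print(unpack_group(unit, p=4, d=4, e_bits=4))
```

Every group packed with the same p, d and e_bits occupies the same number of bits, which is what makes the position of each compressed data unit predictable.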

A parameter of the compression scheme is the mantissa depth, i.e. the number of bits $d$ on which the mantissas are represented. By way of example, $d=4$. The mantissa depth parameter is defined for each group of coefficients and it is normally the same for all groups made up of coefficients of a same scale.

In an exemplary embodiment, the exponent $e$ for a group is computed by module 72 as follows. The absolute value of each coefficient $c_i$ of the group is written in binary form. Then a number $e'$ is defined as the rank of the highest order non-zero bit in all absolute values $|c_i|$. The exponent $e$ is then defined as $e = e' - d + 1$. For instance, if $d=4$, and if the absolute values $|c_i|$ of the coefficients are 1101, 1000001 and 1000, the highest order non-zero bit is in the second coefficient and corresponds to $2^{6}$. Thus $e'=6$, and $e = e' - d + 1 = 3$.
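The exponent rule above can be reproduced in a few lines. This is a sketch under the stated definition only; the function name `common_exponent` is hypothetical, and the text does not say how an all-zero group or a negative result should be handled, so neither case is treated specially here.

```python
def common_exponent(coeffs, d):
    """Return e = e' - d + 1, where e' is the rank of the highest-order
    non-zero bit over the absolute values of all coefficients in the group."""
    largest = max(abs(c) for c in coeffs)
    e_prime = largest.bit_length() - 1   # rank of the top set bit (-1 if all zero)
    return e_prime - d + 1

# Worked example from the text: |c_i| = 1101, 1000001 and 1000 in binary, d = 4
print(common_exponent([0b1101, 0b1000001, 0b1000], d=4))  # 3
```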

Each coefficient $c_i$ of the group may then be represented in module 73 with a uniform quantizer of bin size $2^{e}$, with a 0-bin of size $2^{e+1}$, as is customary in wavelet-based compression: $m_i = \lfloor c_i / 2^{e} \rfloor$ if $c_i > 0$ and $m_i = -\lfloor -c_i / 2^{e} \rfloor$ otherwise, where $\lfloor X \rfloor$ denotes the integer equal to or immediately below $X$. In this way, each mantissa $m_i$ is represented with a sign bit and an integer in the range $[0; 2^{d}-1]$, encoded on $d=4$ bits. This is done with straightforward binary operations by keeping the bits of rank $e, e+1, \ldots, e+d-1$ in the positive representation of each $c_i$, plus a sign bit. The overall budget for storing $p$ coefficients with a mantissa of depth $d$, and an exponent that can be represented on $E$ bits is $E + p \cdot (1+d)$.
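A short sketch of this quantizer and of the bit budget follows. It assumes a non-negative exponent so that the division is a plain right shift of the magnitude; the function names and the value E = 4 used in the example are assumptions, not values given in the text.

```python
def quantize(coeffs, e):
    """Dead-zone quantizer: truncate |c| / 2**e toward zero and keep the sign.
    Assumes e >= 0."""
    return [(c >> e) if c > 0 else -((-c) >> e) for c in coeffs]

def group_bits(p, d, e_bits):
    """Bit budget E + p * (1 + d) for one group of p coefficients."""
    return e_bits + p * (1 + d)

# Coefficients from the exponent example (e = 3), plus negative and small values
print(quantize([13, 65, 8, -13, -5, 0], e=3))   # [1, 8, 1, -1, 0, 0]
print(group_bits(p=4, d=4, e_bits=4))           # 24
```

Values with magnitude below 2**e, whether positive or negative, all map to mantissa 0, which is why the 0-bin is twice as wide as the others.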

On the decoding side, the dequantization is done in unit 82 by replacing each number by the central value of its quantization bin. So if $m_i$ is 0, the decoded value $\tilde{c}_i$ is 0. Otherwise if $m_i > 0$, $\tilde{c}_i = 2^{e} \cdot (m_i + \tfrac{1}{2})$, and if $m_i < 0$, then $\tilde{c}_i = 2^{e} \cdot (m_i - \tfrac{1}{2})$.
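The bin-center reconstruction can be written directly from these three cases. The function name is hypothetical and floating-point output is used here only for readability; a hardware implementation would presumably stay in integer arithmetic.

```python
def dequantize(mantissas, e):
    """Replace each mantissa by the central value of its quantization bin."""
    scale = 2.0 ** e
    out = []
    for m in mantissas:
        if m == 0:
            out.append(0.0)                  # center of the double-width 0-bin
        elif m > 0:
            out.append(scale * (m + 0.5))
        else:
            out.append(scale * (m - 0.5))
    return out

print(dequantize([1, 8, 0, -1], e=3))  # [12.0, 68.0, 0.0, -12.0]
```

For instance, a coefficient of 13 quantized with e = 3 gives mantissa 1 and is reconstructed as 12, the middle of the bin covering 8 to 16.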

Alternatively, the quantization model can be truly uniform instead of having a double 0-bin. A coefficient $c_i$ is represented by a number $2^{e} \cdot m_i$, where $m_i = [c_i / 2^{e}]$ with $[X]$ denoting the integer closest to $X$. In this case, the dequantization is simpler: $\tilde{c}_i = m_i \cdot 2^{e}$.
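The truly uniform variant can be sketched with integer arithmetic as well. Two points are assumptions on top of the text: the exponent is taken to be non-negative, and exact ties (a value halfway between two integers) are rounded toward positive infinity, since the text does not say how ties are resolved. The function names are hypothetical.

```python
def quantize_uniform(coeffs, e):
    """m_i = integer closest to c_i / 2**e (ties rounded toward +infinity)."""
    step = 1 << e
    return [(c + (step >> 1)) // step for c in coeffs]

def dequantize_uniform(mantissas, e):
    """Reconstruction is a plain scaling: m_i * 2**e."""
    return [m << e for m in mantissas]

m = quantize_uniform([13, -13, 3, -3, 65], e=3)
print(m)                          # [2, -2, 0, 0, 8]
print(dequantize_uniform(m, 3))   # [16, -16, 0, 0, 64]
```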

By way of example, the compression process uses an N-scale wavelet transform performed “in-place” with an integer lifting scheme. The coefficient image is split into blocks of 2^(N)×2^(N) coefficients each having the same structure. The coefficients inside a block are denoted c[i, j] where 0≤i<2^(N) and 0≤j<2^(N).

Diagram 91 in FIG. 9 illustrates how a block of these coefficients c[i, j] is arranged in the absence of reordering. The coefficients inside each block are then grouped by scale and orientation. In the particular case of the wavelet transform, the groups are the following:

-   a group G_(N,0) consisting of one low-pass coefficient c[0, 0] at scale N;
-   for each scale n between 1 and N, a group G_(n,1) of horizontal wavelet coefficients c[2^(n)·i+2^(n−1), 2^(n)·j], having p=2^(2(N−n)) coefficients;
-   for each scale n between 1 and N, a group G_(n,2) of vertical wavelet coefficients c[2^(n)·i, 2^(n)·j+2^(n−1)], having p=2^(2(N−n)) coefficients;
-   for each scale n between 1 and N, a group G_(n,3) of diagonal wavelet coefficients c[2^(n)·i+2^(n−1), 2^(n)·j+2^(n−1)], having p=2^(2(N−n)) coefficients.
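The index formulas in this list can be checked with a short enumeration. The sketch assumes that i and j each run from 0 to 2^(N−n)−1, which is inferred from the stated group size p=2^(2(N−n)) rather than spelled out in the text; the function name `wavelet_groups` is hypothetical.

```python
def wavelet_groups(N):
    """Map (scale n, orientation k) to the list of positions inside a 2**N x 2**N block."""
    groups = {(N, 0): [(0, 0)]}  # low-pass coefficient c[0, 0]
    for n in range(1, N + 1):
        step, half = 1 << n, 1 << (n - 1)
        count = 1 << (N - n)  # assumed range of i and j at scale n
        idx = [(i, j) for i in range(count) for j in range(count)]
        groups[(n, 1)] = [(step * i + half, step * j) for i, j in idx]         # horizontal
        groups[(n, 2)] = [(step * i, step * j + half) for i, j in idx]         # vertical
        groups[(n, 3)] = [(step * i + half, step * j + half) for i, j in idx]  # diagonal
    return groups


groups = wavelet_groups(3)
positions = [pos for members in groups.values() for pos in members]
print(len(positions), len(set(positions)))  # prints 64 64
print(len(groups[(1, 1)]), len(groups[(2, 1)]), len(groups[(3, 1)]))  # prints 16 4 1
```

For N=3 the groups cover each of the 64 positions of the block exactly once, with 16, 4 and 1 coefficients per orientation at scales 1, 2 and 3.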

The corresponding groups of coefficients are displayed in diagram 92 of FIG. 9, and the names of the groups are written in diagram 93.

As illustrated in FIG. 9 for N=3, the groups are advantageously reorganized to have more homogeneous sizes. For example, groups G_(N,1), G_(N,2) and G_(N,3) all have a single coefficient, and are reorganized as one group G_(N,1+2+3)=G_(N,1)∪G_(N,2)∪G_(N,3). Conversely, groups G_(n,k) for smaller n can have 16 or 64 coefficients, and can be split into smaller groups of 2×2 or 4×4 coefficients. Diagrams 94 and 95 in FIG. 9 show how the coefficients are grouped in the case where N=3:

-   groups G_(3,1), G_(3,2) and G_(3,3) are grouped together into G_(3,1+2+3)=G_(3,1)∪G_(3,2)∪G_(3,3);
-   groups G_(1,1), G_(1,2), and G_(1,3) are each split into four smaller groups, i.e. G_(1,1) is split into G_(1,1,1), G_(1,1,2), G_(1,1,3) and G_(1,1,4), etc.

In this example, the coefficients can be encoded with the bit budget shown in Table 1.

TABLE 1
Bit budget summary for a block of 8 × 8 coefficients, case 1.

Group          Number   Size   Exponent   Mantissa (d)   Sign   Total
G_(3,0)           1       1       -            13          -    13
G_(3,1+2+3)       1       3       4             4          1    19
G_(2,a)           3       4       4             3          1    20/group
G_(1,a,b)        12       4       4             2          1    16/group
TOTAL                            64           157         63    284/block

The mantissa budgets are 4 bits for coarse-scale coefficients, 3 bits for scale 2 coefficients, and 2 bits for the scale 1 coefficients. The compressed bit rate is 284/64=4.44 bits per pixel, i.e. a compression factor of 2.25:1 assuming a source data rate of 10 bits per pixel.

A lower budget compression could use fewer bits for the mantissas: 3 bits at scale 3, 2 bits at scale 2 and 1 bit at scale 1, leading to the breakdown shown in Table 2.

TABLE 2
Bit budget for a block of 8 × 8 coefficients, case 2.

Group          Number   Size   Exponent   Mantissa (d)   Sign   Total
G_(3,0)           1       1       -            13          -    13
G_(3,1+2+3)       1       3       4             3          1    16
G_(2,a)           3       4       4             2          1    16/group
G_(1,a,b)        12       4       4             1          1    12/group
TOTAL                            64            94         63    221/block

In this case, the compressed bit rate is 221/64=3.45 bits per pixel. The compression factor is 2.90:1, again assuming a source data rate of 10 bits per pixel.
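The totals of Tables 1 and 2 can be re-derived from the per-group formula E+p·(1+d). The sketch below hard-codes the N=3 layout of the tables (one group of 3 coefficients at scale 3, three groups of 4 at scale 2, twelve groups of 4 at scale 1, a 13-bit low-pass coefficient and 4-bit exponents); the constant and function names are hypothetical.

```python
LOWPASS_BITS = 13     # G_(3,0), stored directly
EXPONENT_BITS = 4     # E in the tables
SOURCE_BPP = 10       # source data rate assumed in the text
BLOCK_PIXELS = 64     # 8 x 8 block
# scale -> (number of groups, coefficients per group)
LAYOUT = {3: (1, 3), 2: (3, 4), 1: (12, 4)}


def block_bits(mantissa_depth):
    """Total bits per block for a given mantissa depth d at each scale."""
    total = LOWPASS_BITS
    for scale, (count, size) in LAYOUT.items():
        d = mantissa_depth[scale]
        total += count * (EXPONENT_BITS + size * (1 + d))
    return total


for name, depths in (("case 1", {3: 4, 2: 3, 1: 2}), ("case 2", {3: 3, 2: 2, 1: 1})):
    bits = block_bits(depths)
    bpp = bits / BLOCK_PIXELS
    print(name, bits, round(bpp, 2), round(SOURCE_BPP / bpp, 2))
# prints:
# case 1 284 4.44 2.25
# case 2 221 3.45 2.9
```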

The compression scheme has applications for reducing the volume of data to be stored in external frame stores, thus reducing (1) the size requirement of the external DRAM chip(s) 4 and (2) the bandwidth requirement to this external DRAM storage. For example, the encoder 10 and decoder 20 can be incorporated in a video processing device 8 having the architecture depicted in FIG. 3, with a decompressed line buffer 15 containing a stripe of decompressed pixels accessed by the video processor 6.

Furthermore, the volume of data to be stored in internal line buffers can also be reduced, thus reducing the requirement on the size and silicon surface of the internal line buffer. In this case, the video processing device 8 may have the architecture depicted in FIG. 4, with a compressed line buffer 25 containing coding data for a stripe of pixels and a small-sized decompressed context buffer 35 fed by the decoder 20 and read by the video processor 6.

When handling color images, an embodiment converts the image into luma and chroma channels (e.g. Y, Cb and Cr), and encodes each channel separately. The separate encoding can be performed with different encoding parameters (for example the number of bits allocated to the mantissa for a same kind of coefficient). As an illustration, the luma channel (Y) can be encoded according to Table 1, and the chroma channels (Cb and Cr) according to Table 2. The resulting bit budget is less than 12 bits per pixel, instead of the original 30 bits per pixel.
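The "less than 12 bits per pixel" figure follows from the two block totals, as the short check below shows. It assumes that the chroma channels are not subsampled, which is consistent with the stated original rate of 30 bits per pixel (three channels at 10 bits); the constant names are hypothetical.

```python
BLOCK_PIXELS = 64   # 8 x 8 block
LUMA_BITS = 284     # Table 1 total per block (Y)
CHROMA_BITS = 221   # Table 2 total per block (Cb or Cr)

# One luma block plus two chroma blocks cover the same 64 pixels.
total_bpp = (LUMA_BITS + 2 * CHROMA_BITS) / BLOCK_PIXELS
print(total_bpp)  # prints 11.34375
```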

In another embodiment, the video processor 6, when working at location x, y and at time t does not require fine scale information inside the line buffer at all vertical offsets. For example, fine scale information is required for a total of 41 lines, from y−20 to y+20, and coarse scale information only is required on 20 additional lines y+21, . . . , y+40 above said 41 lines, and also on 20 additional lines y−40, . . . , y−21 below said 41 lines.

In order to take advantage of this, the compressed line buffer 25 can be split into two or more layers. For example, a coarse scale layer contains only coefficients of scale 2 or more (groups G_(3,0), G_(3,1+2+3), G_(2,a) in the example of diagram 95 in FIG. 9), and an additional refinement layer contains coefficients of scale 1 (groups G_(1,a,b) in FIG. 9). The compressed line buffer 25 then only needs to store refinement layer coefficients for 40 lines instead of 80, which provides a substantial gain in internal memory. As a consequence, the coefficients of the refinement layer are loaded into the compressed line buffer later than the coarse scale layer, and discarded earlier, and take up less space in the compressed line buffer.
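To give a rough idea of the gain, the following sketch estimates the compressed line buffer occupancy per pixel column using the Table 1 budget. It is an order-of-magnitude illustration only: it uses the rounded figures of 80 and 40 lines from the paragraph above, ignores the alignment of the stripe on 8-line block rows, and all names are hypothetical.

```python
BLOCK_PIXELS = 64
COARSE_BITS = 13 + 19 + 3 * 20   # G_(3,0), G_(3,1+2+3) and three G_(2,a) groups
REFINEMENT_BITS = 12 * 16        # twelve G_(1,a,b) groups


def bits_per_column(coarse_lines, fine_lines):
    """Average stored bits for one pixel column of the stripe."""
    coarse = coarse_lines * COARSE_BITS / BLOCK_PIXELS
    fine = fine_lines * REFINEMENT_BITS / BLOCK_PIXELS
    return coarse + fine


single_layer = bits_per_column(80, 80)   # refinement kept on every line
two_layers = bits_per_column(80, 40)     # refinement kept on 40 lines only
print(single_layer, two_layers)          # prints 355.0 235.0
```

Under these assumptions the layered buffer needs roughly a third less memory, because the refinement layer accounts for 192 of the 284 bits of each block.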

Again, the coarse scale context lines may not be placed symmetrically above and below the fine scale context (e.g. 20 lines above and 40 lines below the 41 lines). The coarse scale context and the fine scale context may be non-symmetric, non-rectangular and even non-connected.

In another embodiment, the video processor 6 does not require the luma and chroma information at the same processing stage, i.e. not within the same context. Again, this makes it possible to store the chroma channels in the compressed line buffer on fewer lines than, e.g., the luma channel, to load the chroma information later into the compressed line buffer, and to discard this information earlier than the luma information.

While a detailed description of exemplary embodiments of the invention has been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art. Therefore the above description should not be taken as limiting the scope of the invention which is defined by the appended claims.

What is claimed is:
1. A method of compressing a video signal, comprising: applying a multiscale transform to a frame of the video signal to obtain coefficient blocks; distributing the coefficients of each coefficient block into a plurality of coefficient groups, wherein each coefficient group is made up of coefficients from the multiscale transform at a same scale; and for at least one of the plurality of coefficient groups: determining a common exponent for encoding the coefficients of the coefficient group; and determining respective mantissas for quantizing the coefficients of the coefficient group in combination with the common exponent; and storing coding data including each determined common exponent for a coefficient group and the mantissas quantizing the coefficients of the coefficient group in combination with the determined common exponent.
2. The method as claimed in claim 1, wherein an amount of coding data stored for one coefficient group of a coefficient block is the same for all coefficient groups corresponding to a given scale of the multiscale transform.
3. The method as claimed in claim 1, wherein one of the coefficient groups of each coefficient block is made of a low-pass coefficient which is directly included in the stored coding data.
4. The method as claimed in claim 1, wherein an amount of coding data stored for the coefficient groups of a coefficient block is the same for all blocks obtained from a component of the video signal.
5. An encoder for compressing a video signal, comprising: a transforming unit for applying a multiscale transform to a frame of the video signal to obtain coefficient blocks; a mapping unit for distributing the coefficients of each coefficient block into a plurality of coefficient groups, wherein each coefficient group is made up of coefficients from the multiscale transform at a same scale; a coding data generator for generating coding data including, for at least one of the coefficient groups: a common exponent for encoding the coefficients of the coefficient group; and respective mantissas for quantizing the coefficients of the coefficient group in combination with the common exponent.
6. A method of decompressing a video signal from coding data, wherein for a frame of the video signal, the coding data include block data for respective coefficient blocks corresponding to respective regions of the frame in a multiscale transform, wherein each coefficient block comprises a plurality of coefficient groups, each coefficient group being made up of coefficients resulting from the multiscale transform at a same scale, wherein the block data for each coefficient block include exponents respectively associated with some of the coefficient groups of the coefficient block and mantissas respectively associated with the coefficients of some of the coefficient groups, the method comprising: reading at least part of the block data for at least one coefficient block, the read block data including the exponent associated with at least one coefficient group selected among the coefficient groups of the coefficient block and the mantissas respectively associated with the coefficients of each selected coefficient group; recovering encoded values of the coefficients of each selected coefficient group by combining the mantissas respectively associated with the coefficients and the exponent associated with the selected coefficient group; assembling at least partially the coefficient block using the encoded coefficient values; and applying an inverse multiscale transform to the assembled coefficient block.
7. The method as claimed in claim 6, wherein the block data for each coefficient block further include a low-pass coefficient read and assembled with the encoded coefficient values to obtain the assembled coefficient block for the inverse multiscale transform.
8. The method as claimed in claim 6, wherein reading the block data for the coefficient block comprises selecting the at least one coefficient group based on a target definition for decompressing the frame of the video signal.
9. A decoder for decompressing a video signal from coding data, wherein, for a frame of the video signal, the coding data include block data for respective coefficient blocks corresponding to respective regions of the frame in a multiscale transform, wherein each coefficient block comprises a plurality of coefficient groups, wherein the block data for each coefficient block include exponents respectively associated with some of the coefficient groups of the coefficient block and mantissas respectively associated with the coefficients of some of the coefficient groups, the decoder comprising: an extraction unit for extracting the coding data for at least one coefficient group selected among the coefficient groups of the coefficient block; a computation unit for combining the mantissas respectively associated with the coefficients of each selected group and the exponent associated with the selected group to obtain encoded values of the coefficients; an assembling unit for assembling at least partially the coefficient block using the encoded coefficient values; and a transforming unit for applying an inverse multiscale transform to the assembled coefficient block.
10. A video processing device, comprising: an encoder for compressing a video signal in the form of coding data for successive frames of the video signal; a memory interface for storing the coding data in an external memory and retrieving coding data from the external memory; a decoder for converting retrieved coding data into a decompressed signal; and a video processor for processing the decompressed signal, wherein the encoder comprises: a transforming unit for applying a multiscale transform to a frame of the video signal to obtain coefficient blocks; a mapping unit for distributing the coefficients of each coefficient block into a plurality of coefficient groups; and a coding data generator for generating coding data including, for at least one of said coefficient groups: a common exponent for encoding the coefficients of the coefficient group; and respective mantissas for quantizing the coefficients of the coefficient group in combination with the common exponent, and wherein the decoder comprises: an extraction unit for extracting the coding data for at least one coefficient group selected among the groups of a coefficient block; a computation unit for combining the mantissas forming part of the coding data for the selected coefficient group and the exponent forming part of the coding data for the selected coefficient group to obtain encoded values of the coefficients of the selected coefficient group; an assembling unit for assembling at least partially the coefficient block using the encoded coefficient values; and a transforming unit for applying an inverse multiscale transform to the assembled coefficient block.
11. The device as claimed in claim 10, wherein each coefficient group is made up of coefficients resulting from the multiscale transform at a same scale.
12. The device as claimed in claim 11, wherein an amount of coding data stored for one coefficient group of a coefficient block is the same for all coefficient groups corresponding to a given scale of the multiscale transform.
13. The device as claimed in claim 12, wherein the at least one coefficient group is selected based on a target definition for processing the decompressed signal.
14. The device as claimed in claim 10, wherein the block data for each coefficient block further include a low-pass coefficient assembled with the encoded coefficient values to obtain the assembled coefficient block for the inverse multiscale transform.
15. The device as claimed in claim 10, further comprising a line buffer for storing the decompressed signal along a stripe of consecutive lines of at least one frame of the video signal, wherein the video processor is arranged to read the decompressed signal from the line buffer.
16. The device as claimed in claim 10, wherein the decoder is arranged to update the content of the context buffer as the video processor proceeds along a line of pixels of a frame of the video signal, by deleting at least one column of pixels on one side of the context portion and adding, on an opposite side of the context portion, at least one other column of decompressed pixels obtained by the decoder based on coding data retrieved from the line buffer for selected regions covering the other column of decompressed pixels.
17. The device as claimed in claim 15, wherein the line buffer has a first layer for receiving coding data pertaining to first coefficient groups made up of coefficients resulting from at least a first scale of the multiscale transform for a plurality of regions spanning the stripe of lines, and a second layer for receiving coding data pertaining to second coefficient groups made up of coefficients resulting from at least a second scale of the multiscale transform, wherein the second scale is finer than the first scale for some regions of the plurality of regions spanning a narrower stripe of the frame of the video signal, and wherein the decoder is arranged to generate the decompressed signal of the context portion by extracting coding data from both the first and the second layers of the line buffer.
18. The device as claimed in claim 10, wherein the compressed video signal corresponds to luma and chroma channels, and wherein the coding data generated from the signal component of the luma channel are allocated more bits than the coding data generated from the signal component of each chroma channel.
19. The device as claimed in claim 10, wherein an amount of coding data stored in the external memory for the coefficient groups of a coefficient block is the same for all coefficient blocks obtained from a component of the video signal.
20. The device as claimed in claim 10, wherein each coefficient group has a respective mantissa depth parameter assigned thereto, corresponding to a number of bits representing each mantissa forming part of the coding data for the coefficient group, and wherein the common exponent for the coefficient group is determined based on the values of the coefficients of the coefficient group and on said mantissa depth parameter.
21. The device as claimed in claim 20, wherein each coefficient group for which coding data including an exponent and mantissas are generated is made up of coefficients resulting from the multiscale transform at a same scale n, n being an integer scale index between 1 and N and N being the number of scales of the multiscale transform, and wherein the mantissa depth parameter is a decreasing function of the scale index n.
22. A video processing device, comprising: an encoder for compressing a video signal in the form of coding data for successive frames of the video signal; a memory interface for storing the coding data in an external memory and retrieving coding data from the external memory; a line buffer for storing coding data transferred from the external memory for a plurality of regions of a frame of the video signal spanning a stripe of lines of the frame of the video signal; a decoder for converting coding data read in the line buffer into a decompressed signal; a video processor for processing the decompressed signal; and a context buffer for storing the decompressed signal of a context portion of the frame of the video signal, the context portion being included in said stripe of lines and offset according to a pixel location addressed by the video processor.
23. The device as claimed in claim 22, wherein the decoder is arranged to update the content of the context buffer as the video processor proceeds along a line of pixels of a frame of the video signal, by deleting at least one column of pixels on one side of the context portion and adding, on an opposite side of the context portion, at least one other column of decompressed pixels obtained by the decoder based on coding data retrieved from the line buffer for selected regions covering the other column of decompressed pixels.
24. The device as claimed in claim 22, wherein the coding data represent coefficients of a multiscale transform, wherein the line buffer has a first layer for receiving coding data representing first coefficients of at least one first scale for the plurality of regions spanning the stripe of lines, and a second layer for receiving coding data representing second coefficients of at least one second scale finer than the first scale for some of the regions of the plurality of regions spanning a narrower stripe of the frame of the video signal, and wherein the decoder is arranged to generate the decompressed signal of the context portion by extracting coding data from both the first and the second layers of the line buffer.